Efficient Neural Audio Synthesis

Authors

Nal Kalchbrenner

Erich Elsen

Karen Simonyan

Seb Noury

Norman Casagrande

Edward Lockhart

Florian Stimberg

Aäron van den Oord

Sander Dieleman

Koray Kavukcuoglu

Abstract

Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24 kHz 16-bit audio 4× faster than real time on a GPU. Second, we apply a weight pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of parameters, large sparse networks perform better than small dense networks and this relationship holds for sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time. Finally, we propose a new generation scheme based on subscaling that folds a long sequence into a batch of shorter sequences and allows one to generate multiple samples at once. The Subscale WaveRNN produces 16 samples per step without loss of quality and offers an orthogonal method for increasing sampling efficiency.
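The subscaling idea in the abstract, folding a long sequence into a batch of shorter sequences, can be illustrated with a small data-layout sketch. It assumes that sub-sequence i holds every B-th sample starting at offset i, with B = 16 to match the 16 samples per step mentioned above. The function names `fold_subscale` and `unfold_subscale` are hypothetical, and the sketch shows only the reshaping. It does not model how the Subscale WaveRNN conditions each sub-sequence on the others, which the abstract does not describe.

```python
def fold_subscale(samples, batch_factor=16):
    """Fold a 1-D sequence into `batch_factor` interleaved sub-sequences.

    Sub-sequence i holds samples i, i + B, i + 2B, ... (B = batch_factor).
    Assumption: len(samples) is a multiple of B; padding is left to the caller.
    """
    if len(samples) % batch_factor != 0:
        raise ValueError("sequence length must be a multiple of batch_factor")
    return [list(samples[i::batch_factor]) for i in range(batch_factor)]


def unfold_subscale(folded):
    """Inverse of fold_subscale: interleave the sub-sequences back in order."""
    return [x for column in zip(*folded) for x in column]


# Toy "audio": 32 integer samples standing in for a waveform.
audio = list(range(32))
folded = fold_subscale(audio, batch_factor=16)
restored = unfold_subscale(folded)
```

Each of the 16 rows in `folded` is one sixteenth the length of the original sequence, so a model that advances all rows together emits 16 samples per step. Unfolding recovers the original sample order.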

Similar resources

Musical Audio Synthesis Using Autoencoding Neural Nets

With an optimal network topology and tuning of hyperparameters, artificial neural networks (ANNs) may be trained to learn a mapping from low level audio features to one or more higher-level representations. Such artificial neural networks are commonly used in classification and regression settings to perform arbitrary tasks. In this work we suggest repurposing autoencoding neural networks as mu...

Synthesis of nickel ferrite nanoparticles as an efficient magnetic sorbent for removal of an azo-dye: Response surface methodology and neural network modeling

In this research, nickel ferrite (NiFe2O4) nanoparticles (NFNs) are prepared through coprecipitation method, and applied for adsorption removal of a model organic pollutant, methyl orange (MO). The characterization of t...

Speaker-independent 3D face synthesis driven by speech and text

In this study, a complete system that generates visual speech by synthesizing 3D face points has been implemented. The estimated face points drive MPEG-4 facial animation. This system is speaker independent and can be driven by audio or both audio and text. The synthesis of visual speech was realized by a codebook-based technique, which is trained with audio-visual data from a speaker. An audio...

Char2wav: End-to-end Speech Synthesis

We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. Neural vocode...

A Neural Network Principal Component Synthesizer for Expressive Control of Musical Sounds

This dissertation introduces a connectionist model that maps perceptual controllers to synthesis parameters to allow for an intuitive and powerful musical control of audio synthesis. This model, or system, allows the extraction, abstraction, reproduction and transformation of relevant features of a musician's style. All the information is deduced exclusively from audio. No prior knowledge of th...

Journal: CoRR

Volume: abs/1802.08435

Publication date: 2018
